21 research outputs found

SYNTHESIZING DYSARTHRIC SPEECH USING MULTI-SPEAKER TTS FOR DYSARTHRIC SPEECH RECOGNITION

Author: Soleymanpour Mohammad
Publication venue: UKnowledge
Publication date: 01/01/2022

Dysarthria is a motor speech disorder often characterized by reduced speech intelligibility through slow, uncoordinated control of speech production muscles. Automatic speech recognition (ASR) systems may help dysarthric talkers communicate more effectively. However, robust dysarthria-specific ASR requires a significant amount of training speech, which is not readily available for dysarthric talkers. In this dissertation, we investigate dysarthric speech augmentation and synthesis methods. To better understand differences in prosodic and acoustic characteristics of dysarthric spontaneous speech at varying severity levels, a comparative study between typical and dysarthric speech was conducted. These characteristics are important components for dysarthric speech modeling, synthesis, and augmentation. For augmentation, prosodic transformation and time-feature masking have been proposed. For dysarthric speech synthesis, this dissertation has introduced a modified neural multi-talker TTS by adding a dysarthria severity level coefficient and a pause insertion model to synthesize dysarthric speech for varying severity levels. In addition, we have extended this work by using a label propagation technique to create more meaningful control variables such as a continuous Respiration, Laryngeal and Tongue (RLT) parameter, even for datasets that only provide discrete dysarthria severity level information. This approach increases the controllability of the system, so we are able to generate more dysarthric speech with a broader range. To evaluate the effectiveness of these methods for synthesizing training data, dysarthria-specific speech recognition was used. Results show that a DNN-HMM model trained on additional synthetic dysarthric speech achieves a WER improvement of 12.2% compared to the baseline, and that the addition of the severity level and pause insertion controls decreases WER by 6.5%, showing the effectiveness of adding these parameters. Overall results on the TORGO database demonstrate that using dysarthric synthetic speech to increase the amount of dysarthric-patterned speech for training has a significant impact on dysarthric ASR systems.
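The abstract reports its results as word error rate (WER) without defining it. As a minimal sketch, assuming the usual definition (word-level edit distance divided by the number of reference words), WER can be computed as below. The function name `word_error_rate` and the example sentences are illustrative and not from the dissertation, and the abstract does not say whether its 12.2% and 6.5% figures are absolute or relative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: one substitution and one deletion over 6 words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER: {wer:.1%}")
```

Because the denominator is the reference length, WER can exceed 100% when the hypothesis contains many insertions.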

University of Kentucky

Accurate synthesis of Dysarthric Speech for ASR data augmentation

Author: Berry Jeffrey
Johnson Michael T.
Soleymanpour Mohammad
Soleymanpour Rahim
Publication date: 16/08/2023

Dysarthria is a motor speech disorder often characterized by reduced speech intelligibility through slow, uncoordinated control of speech production muscles. Automatic speech recognition (ASR) systems can help dysarthric talkers communicate more effectively. However, robust dysarthria-specific ASR requires a significant amount of training speech, which is not readily available for dysarthric talkers. This paper presents a new dysarthric speech synthesis method for the purpose of ASR training data augmentation. Differences in prosodic and acoustic characteristics of dysarthric spontaneous speech at varying severity levels are important components for dysarthric speech modeling, synthesis, and augmentation. For dysarthric speech synthesis, a modified neural multi-talker TTS is implemented by adding a dysarthria severity level coefficient and a pause insertion model to synthesize dysarthric speech for varying severity levels. To evaluate the effectiveness of the synthesized speech as ASR training data, dysarthria-specific speech recognition was used. Results show that a DNN-HMM model trained on additional synthetic dysarthric speech achieves a WER improvement of 12.2% compared to the baseline, and that the addition of the severity level and pause insertion controls decreases WER by 6.5%, showing the effectiveness of adding these parameters. Overall results on the TORGO database demonstrate that using dysarthric synthetic speech to increase the amount of dysarthric-patterned speech for training has a significant impact on dysarthric ASR systems. In addition, we have conducted a subjective evaluation of the dysarthric-ness and similarity of synthesized speech. Our subjective evaluation shows that the perceived dysarthric-ness of synthesized speech is similar to that of true dysarthric speech, especially for higher levels of dysarthria. Comment: arXiv admin note: text overlap with arXiv:2201.1157

arXiv.org e-Print Archive

Bilingual Streaming ASR with Grapheme units and Auxiliary Monolingual Loss

Author: Bahmaninezhad Fahimeh
Ismail Mahmoud Al
Kumar Kshitiz
Soleymanpour Mohammad
Wu Jian
Publication date: 11/08/2023

We introduce a bilingual solution to support English as a secondary locale for most primary locales in hybrid automatic speech recognition (ASR) settings. Our key developments are: (a) a pronunciation lexicon with grapheme units instead of phone units, (b) a fully bilingual alignment model and subsequently a bilingual streaming transformer model, (c) a parallel encoder structure with language identification (LID) loss, and (d) a parallel encoder with an auxiliary loss for monolingual projections. We conclude that, in comparison to LID loss, our proposed auxiliary loss is superior in specializing the parallel encoders to their respective monolingual locales, and that this contributes to stronger bilingual learning. We evaluate our work on large-scale training and test tasks for bilingual Spanish (ES) and bilingual Italian (IT) applications. Our bilingual models demonstrate strong English code-mixing capability. In particular, the bilingual IT model improves the word error rate (WER) for a code-mix IT task from 46.5% to 13.8%, while also achieving close parity (9.6%) with the monolingual IT model (9.5%) over IT tests.
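To put the reported numbers in perspective, the short sketch below works out the absolute and relative WER reduction on the code-mix IT task from the two figures quoted in the abstract (46.5% and 13.8%). The helper name `relative_wer_reduction` is an illustrative assumption; the abstract itself reports only the two WER values.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER that was eliminated."""
    return (baseline_wer - new_wer) / baseline_wer


# WER values quoted in the abstract for the code-mix IT task (in percent).
baseline, bilingual = 46.5, 13.8

absolute = baseline - bilingual
relative = relative_wer_reduction(baseline, bilingual)
print(f"absolute: {absolute:.1f} points, relative: {relative:.1%}")
# prints "absolute: 32.7 points, relative: 70.3%"
```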

arXiv.org e-Print Archive

Comparison of SINR calculation for conventional MVDR, PSO-MVDR, GSA-MVDR, SLGSA-MVDR [28] and ECGSA-MVDR for user at 0° and interference at 30°.

Author: Hassan Rezai Soleymanpour (3155487)
Mohammad Tariqul Islam (584841)
Salehin Kibria (823905)
Sieh Kiong Tiong (823904)
Soodabeh Darzi (823903)

Comparison of SINR calculation for conventional MVDR, PSO-MVDR, GSA-MVDR, SLGSA-MVDR [28] and ECGSA-MVDR for user at 0° and interference at 30°.

FigShare

Comparison of performance of power response with 100 iterations for user at 0° with two interferences at 30° and 50°.

Author: Hassan Rezai Soleymanpour (3155487)
Mohammad Tariqul Islam (584841)
Salehin Kibria (823905)
Sieh Kiong Tiong (823904)
Soodabeh Darzi (823903)

(a) MVDR, (b) PSO-MVDR, (c) GSA-MVDR, (d) SLGSA-MVDR [28] and (e) ECGSA-MVDR.

FigShare

An Experience Oriented-Convergence Improved Gravitational Search Algorithm for Minimum Variance Distortionless Response Beamforming Optimum

Author: Hassan Rezai Soleymanpour (3155487)
Mohammad Tariqul Islam (584841)
Salehin Kibria (823905)
Sieh Kiong Tiong (823904)
Soodabeh Darzi (823903)
Publication date: 11/07/2016

An experience oriented-convergence improved gravitational search algorithm (ECGSA) based on two new modifications, searching through the best experiments and the use of a dynamic gravitational damping coefficient (α), is introduced in this paper. ECGSA saves its best fitness function evaluations and uses those as the agents’ positions in the searching process. In this way, the best trajectories found are retained and the search starts from these trajectories, which allows the algorithm to avoid local optima. Also, by means of the proposed dynamic gravitational damping coefficient, the agents can move faster in the search space to obtain better exploration during the first stage of the search process, and they can converge rapidly to the optimal solution at the final stage. The performance of ECGSA has been evaluated by applying it to eight standard benchmark functions along with six complicated composite test functions. It is also applied to the adaptive beamforming problem as a practical case, to improve the weight vectors computed by the minimum variance distortionless response (MVDR) beamforming technique. The results of the proposed algorithm are compared with some well-known heuristic methods and verify the proposed method both in reaching optimal solutions and in robustness.
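The abstract names MVDR beamforming without stating it. For reference, the standard textbook MVDR formulation is sketched below. The symbols are notation assumed here rather than taken from the abstract: w is the array weight vector, R the covariance matrix of the received signal, and a(θ₀) the steering vector toward the desired user. How ECGSA refines these weights is not shown.

```latex
% Standard MVDR: minimise output power subject to unit gain toward \theta_0
\min_{\mathbf{w}} \; \mathbf{w}^{H} \mathbf{R}\, \mathbf{w}
\quad \text{subject to} \quad
\mathbf{w}^{H} \mathbf{a}(\theta_0) = 1

% Closed-form solution
\mathbf{w}_{\mathrm{MVDR}}
  = \frac{\mathbf{R}^{-1} \mathbf{a}(\theta_0)}
         {\mathbf{a}^{H}(\theta_0)\, \mathbf{R}^{-1}\, \mathbf{a}(\theta_0)}
```

The constraint keeps the response toward the user at 0° undistorted, while minimising the output power suppresses interference from other directions, such as the 30° and 50° sources named in the figure captions.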

Directory of Open Access Journals

FigShare

The simplified flowchart of ECGSA beamforming.

Author: Hassan Rezai Soleymanpour (3155487)
Mohammad Tariqul Islam (584841)
Salehin Kibria (823905)
Sieh Kiong Tiong (823904)
Soodabeh Darzi (823903)

The simplified flowchart of ECGSA beamforming.

FigShare

Mass acceleration toward the result force in GSA [20].

Author: Hassan Rezai Soleymanpour (3155487)
Mohammad Tariqul Islam (584841)
Salehin Kibria (823905)
Sieh Kiong Tiong (823904)
Soodabeh Darzi (823903)

Mass acceleration toward the result force in GSA [20].

FigShare

Comparison of weight vectors for conventional MVDR, PSO-MVDR, GSA-MVDR, SLGSA-MVDR [28] and ECGSA-MVDR for user at 0° and interferences at 30° and 50°.

Author: Hassan Rezai Soleymanpour (3155487)
Mohammad Tariqul Islam (584841)
Salehin Kibria (823905)
Sieh Kiong Tiong (823904)
Soodabeh Darzi (823903)

Comparison of weight vectors for conventional MVDR, PSO-MVDR, GSA-MVDR, SLGSA-MVDR [28] and ECGSA-MVDR for user at 0° and interferences at 30° and 50°.

FigShare

Composition tests functions.

Author: Hassan Rezai Soleymanpour (3155487)
Mohammad Tariqul Islam (584841)
Salehin Kibria (823905)
Sieh Kiong Tiong (823904)
Soodabeh Darzi (823903)

Composition tests functions.

FigShare